Abstract

Balanced allocation of online balls-into-bins has long been an active area of research for efficient
load balancing and hashing applications. There exists a large number of results in this domain for
different settings, such as parallel allocations [1], multi-dimensional allocations [5], weighted balls [4],
etc. For sequential multi-choice allocation, where m balls are thrown into n bins with each ball choosing
d (constant) bins independently and uniformly at random, the maximum load of a bin is O(log log n) + m/n
with high probability [3]. This is the best known allocation scheme to date. However, for
d = Θ(log n), the gap reduces to O(1) [11]. A similar constant gap bound has been established for parallel
allocations with O(log* n) communication rounds [14].

In this paper we propose a novel multi-choice allocation algorithm, Improved D-choice with Estimated
Average (IDEA), achieving a constant gap with high probability for the sequential single-dimensional
online allocation problem with constant d. We achieve a maximum load of ⌈m/n⌉ with high probability
for the constant d-choice scheme with an expected constant number of retries or rounds per ball. We also
show that the bound holds even for an arbitrarily large number of balls, m ≫ n. Further, we generalize
this result to (i) the weighted case, where balls have weights drawn from an arbitrary weight distribution
with finite variance, (ii) the multi-dimensional setting, where balls have D dimensions with f randomly
and uniformly chosen filled dimensions for m = n, and (iii) the parallel case, where n balls arrive and
are placed in parallel in the bins. We show that the gap in these cases is also a constant w.h.p.
(independent of m) for a constant value of d with an expected constant number of retries per ball.

1 Introduction



A central research area in the domain of randomized algorithms is the occupancy problem for
balls-into-bins processes [12][8][3][14][16]. The framework of the problem involves the analysis of online
allocation, wherein a set of independent balls is to be assigned to a set of bins. The occupancy problem
helps to cast several realistic problems into a formal mathematical structure, and hence opens an active
area of work in probability theory as well as in computer science.

In the classical "balls-into-bins" problem, m balls are sequentially thrown into n bins, where each ball 
is placed into one of the bins independently and uniformly at random (i.u.r.). The natural question then is 
to analyze the maximum load in any of the bins. Mapping the problem to the application domain, we may 
consider the balls to be jobs or tasks and the bins to be servers. The problem then reduces to scheduling the 
jobs with balanced load allocations among the servers. 
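
As an illustration of the classical process described above, the following minimal Python sketch throws m balls into n bins i.u.r. and reports the maximum load. The function name `single_choice_max_load` and the `seed` parameter are assumptions introduced here for illustration; they do not come from the paper.

```python
import random


def single_choice_max_load(m: int, n: int, seed: int = 0) -> int:
    """Throw m balls into n bins i.u.r. and return the maximum bin load."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    loads = [0] * n
    for _ in range(m):
        # Each ball picks exactly one bin, independently and uniformly.
        loads[rng.randrange(n)] += 1
    return max(loads)


if __name__ == "__main__":
    n = 1000
    print("max load for m = n:", single_choice_max_load(n, n))
    print("average load:", n / n)
```

Running the sketch for several seeds gives a feel for how far the fullest bin can drift from the average m/n when no load information is used.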

Probably one of the earliest applications of randomized load balancing is in the context of hashing.
When collisions are resolved by chaining, the lengths of the lists in the hash buckets are a measure of the
retrieval complexity. For a uniform hash function, the lengths of the lists follow the same distribution as
the number of balls in a bin.

The advent of parallel and distributed systems required efficient online load balancing among the servers 
to improve the throughput of the system. Dependence on a centralized environment for uniform load bal- 
ancing is highly undesirable for such systems due to high communication complexity. With the introduction 
of the Cloud Computing paradigm, the placement of virtual machines (VMs) on servers provided a new 
dimension to the applicability of the randomized balanced allocation study. 

Other applications, such as the design of multimedia or data servers, use disk arrays where a data
unit is partitioned and stored in a distributed fashion. These applications demand even (balanced)
access of the disks on retrieval [19], and Karp [13] discusses applications in video-on-demand (termed
k-orientability HI). The balls into bins problem accurately describes these applications only when the balls 
have uniform weights. Other applications assume the loads to be of different weights to model its various 
dimensions. 

This paper tackles the problem of sequential online allocation of balls into bins. Given n bins and m
balls that arrive one at a time and are to be thrown into these bins, the problem is to devise an efficient
algorithm such that the allocation of the balls is nearly balanced among all the bins. In formal terms, the
load in each of the bins should be as close to the average (m/n) as possible. We initially study the
single-dimensional sequential placement of uniformly weighted balls into bins and then extend it to the
general weighted case. Finally, we also observe that IDEA provides the same result w.h.p. for the
multi-dimensional balls-into-bins problem for m = n.

In this context, we define Gap to be the difference between the load of the heaviest loaded bin and the
average load. The currently best known algorithm bounds Gap to O(log log n) with high probability using
the symmetric d-choice placement strategy [2] [3]. In the d-choice method, each ball selects d bins i.u.r.
among the n bins and is allocated to the least loaded bin among them. It is well known that if
d = Θ(log n), the gap is O(1) [11].
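
The symmetric d-choice rule described above can be sketched in a few lines of Python. This is only a minimal simulation under stated assumptions: the d candidates are drawn independently (so a bin may be drawn twice), ties are broken by candidate order, and the names `greedy_d_loads` and `gap` are hypothetical helpers introduced here.

```python
import random


def greedy_d_loads(m: int, n: int, d: int, seed: int = 0) -> list:
    """Place m balls into n bins; each ball goes to the least loaded of d random bins."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        # d independent uniform choices (assumption: drawn with replacement).
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: loads[i])
        loads[target] += 1
    return loads


def gap(loads: list) -> float:
    """Gap as defined in the text: heaviest load minus the average load."""
    return max(loads) - sum(loads) / len(loads)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, gap(greedy_d_loads(10_000, 1_000, d)))
```

With d = 1 the sketch reduces to the classical single-choice process, which makes it convenient for comparing the two side by side.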

In this paper we propose a novel algorithm, Improved D-choice with Estimated Average (IDEA), for
efficient placement of the balls in the bins. We prove that this technique provides a constant Gap with
high probability (w.h.p.) even when d is kept constant, albeit with an expected constant number of retries
or rounds per ball. We further extend the result to show that the guarantee also holds w.h.p. for the
heavily loaded case, i.e., m ≫ n. Our technique differs from the typical greedy d-choice process in that
it places the ball in a bin whose load is equal to or lower than the estimated average of that bin. Using an
expected constant number of retries, such a bin can be found for each ball; hence the load in each bin
tends towards its estimated average, which in turn tends towards the actual average, resulting in a constant
upper bound on the gap. Our strategy also differs from the typical asymmetric strategy [22], where in case
of a tie over the load, the leftmost bin gets the ball. Our result can have profound implications, both
theoretical and practical, for online load balancing algorithms.

The outline of the paper is as follows: Section 2 presents an introduction to the known works and results
in this domain. In Section 3 we propose the detailed outline of the IDEA algorithm for allocating the balls
into the bins. Section 4 provides the theoretical proof for bounding the Gap to a constant quantity with high
probability. Section 5 provides insights into the execution of the IDEA algorithm. Section 6.1 depicts its
extension for the general weighted balls case, Section 6.2 exhibits similar results for the multi-dimensional
scenario, and Section 6.3 proposes the protocol for achieving the same results for the parallel scenario.
Finally, Section 7 concludes the paper.

2 Related Work 

The study of the "balls-into-bins" problem dates back to the study of hashing by Gonnet. He showed that when
n balls are thrown into n bins i.u.r., the fullest bin has an expected load of (1 + o(1)) log n / log log n [12].
The maximum load in this approach was shown to be O(log n / log log n) w.h.p. [9]. It was also shown
that for m > n log n balls, a bin can have a maximum load of m/n + Θ(√(m log n / n)).

Azar et al. [2] showed that if each ball sequentially chooses d ≥ 2 bins i.u.r. (called the Greedy[d]
algorithm) and is greedily placed in the bin currently with the lowest load, the Gap can be bounded by
O(log log n / log d) w.h.p. However, the solution worked only for the case when m = n. They also showed
that the bound is stochastically optimal, i.e., any other greedy approach using the placement information of
the previous balls to place the current ball majorizes their approach. However, if the alternatives are drawn
from separate groups with different rules for tie breaking, it results in different allocations. [22] presents such
an asymmetric strategy and, using a witness-tree based analysis, proves that this leads to an improvement in
the load balance to O(ln ln n / (d ln φ_d)) w.h.p., where φ_2 is the golden ratio and φ_d is a simple
generalization. Our algorithm is different from both these techniques in that it uses the estimated gap as the
criterion for choosing the bin and makes potentially multiple retries, where in each retry d bins are chosen i.u.r.

For the heavily loaded case, m ≫ n, the bound of O(log log n / log d) w.h.p. was later proven in [3]
using sophisticated techniques in two main high-level steps. In the first step, they show that when the number
of balls is polynomially bounded by the number of bins, the gap can be bounded by O(ln ln n), using the
concept of layered induction and some additional tricks. In particular, they consider the entire distribution
of the bins in the analysis (while in the typical m = O(n) case the bins with load smaller than the average
could be ignored). In the second step, they extend this result to the general m ≫ n case by showing that
the multiple-choice processes are fundamentally different from the classical single-choice process in that
they have short memory. This property states that given some initial configuration with gap Δ, after adding
poly(n) more balls the initial configuration is forgotten. The short memory property is proven by
analyzing the mixing time of the underlying Markov chain describing the load distribution of the bins. The
mixing time is studied via a new variant of the coupling method (called neighboring coupling). It was
also shown that when d = Θ(log n) the gap becomes O(1) [11].

Cole et al. [7] showed that the two-choice paradigm can be applied effectively in a different context,
namely, that of routing virtual circuits in interconnection networks with low congestion. They showed how
to incorporate the two-choice approach into a well-studied paradigm due to Valiant for routing virtual circuits
to achieve significantly lower congestion.

Kunal et al. [20] prove that for weighted balls (weight distribution with finite fourth moment) and m ≫ n,
the expected gap is independent of the number of balls and is less than n^c, where c depends on the weight
distribution. They first prove the weak gap theorem, which says that w.h.p. Gap(t) < t^{2/3}. Since in the
weighted case the d-choice process is not dominated by the one-choice process, they prove the weak gap
theorem via a potential function argument. Then, the short memory theorem is proved. While in [3] the
short memory theorem is proven via coupling, [20] uses similar coupling arguments but defines a different
distance function and uses a sophisticated argument to show that the coupling converges.

The (1 + β)-choice scheme [17] proved that if a ball chooses, with probability β ∈ (0, 1), the less loaded
of d = 2 randomly chosen bins, and otherwise a single bin i.u.r., the Gap becomes independent of m and
is given by O(log n / β).
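
The (1 + β)-choice rule just described can be sketched as follows. This is a minimal simulation, not the construction analysed in [17]; the function name `one_plus_beta_loads`, the tie-breaking rule and the `seed` parameter are assumptions made for illustration.

```python
import random


def one_plus_beta_loads(m: int, n: int, beta: float, seed: int = 0) -> list:
    """With probability beta use two choices, otherwise a single uniform choice."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        if rng.random() < beta:
            # Two-choice step: take the less loaded of two random bins.
            a, b = rng.randrange(n), rng.randrange(n)
            target = a if loads[a] <= loads[b] else b
        else:
            # Single-choice step: one bin i.u.r.
            target = rng.randrange(n)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    loads = one_plus_beta_loads(100_000, 1_000, beta=0.5)
    print("gap:", max(loads) - sum(loads) / len(loads))
```

Setting `beta` to 0 recovers the single-choice process, and values close to 1 approach the two-choice process.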

In the parallel setting, [14] showed that a constant bound on the gap is possible with O(log* n) communication
rounds. Adler et al. [1] consider parallel balls and bins with multiple rounds. They present analysis
for Q( ^°fo'g(^^"^ ) bound on the gap (for m = 0{n)) using Q( ^°fo'g(^^"^ + rounds of communication. 

For the offline balls-into-bins problem, using maximum flow computations it was shown that the maximum
load of a bin w.h.p. is ⌈m/n⌉ + 1. [8] showed that for m > c n log n balls, where c is a sufficiently large
constant, a perfect distribution of the balls is possible w.h.p. However, no similar result is found in
the literature for the online sequential case with constant d choices.

Mitzenmacher et al. [5] address both the single-choice and d-choice paradigms for multidimensional
balls and bins under the assumption that the balls are uniform D-dimensional (0, 1) vectors, where each ball
has exactly f populated dimensions. They show that the gap for multidimensional balls and bins, using the
two-choice process, is bounded by O(log log(nD)). We provide a better bound of O(1) w.h.p. for the m = n
case.

In this paper, we study a novel online sequential allocation algorithm for balls-into-bins based on a
constant d-choice strategy and prove a constant gap bound for m = n, for the heavily loaded case
m ≫ n, and for the general weighted-balls and multi-dimensional scenarios.

3 The IDEA Algorithm 

In this section we discuss the execution of the Improved D-choice with Estimated Average (IDEA)
algorithm. We consider n bins and m balls which arrive in an online fashion. We initially assume
that the balls are of uniform weight and are numbered according to the order of their arrival. In hashing
applications, the numbering of the balls by arrival order plays no role in assisting better or faster
retrieval. Hence, this assumption does not decrease the complexity of the problem at hand. Later we also
provide a blueprint for the case when such a numbering of the balls is not allowed, and for the weighted
balls case with the weights of the balls drawn from an arbitrary distribution with finite variance.

If each bin had accurate knowledge of the average number of balls in the system, m/n, it would be easy
to distribute the balls so as to obtain a perfectly balanced allocation. IDEA operates on this principle:
each bin independently calculates a fairly good estimate of the current average number of balls in the
system, and each bin is then loaded nearly equal to its estimated average value. In the remainder of this
section we show how each bin independently estimates its average, which we later prove to be very close,
with high probability, to the actual average, m/n. We also show that each bin is then loaded close to its
estimated average value, giving a maximum load of ⌈m/n⌉ with a constant gap allocation w.h.p.

The IDEA algorithm initially works as in the d-choice algorithm. On arrival of a ball b_j, it chooses
d bins i.u.r. (d is constant) as its possible candidates for placement. Each bin B_i, i ∈ [1, n], is
characterised by two parameters: (i) Current Load, L_i, and (ii) Current Estimated Average, A_i. For each
bin we define its estimated gap, Gap_i, as the difference between its current load and its current
estimated average. Formally,

Gap_i = L_i − A_i.

The ball b_j is then allocated to the bin having the lowest value of Gap_i among the d chosen bins. Given
the definition of Gap (in Section 1), we would like to place the ball in a bin with negative or zero Gap_i.
This would ensure that the loads in the bins are close to their estimated average values and thus lead to
a lower Gap. Hence, if in the d choices a ball selects no bin with negative or zero Gap_i, it re-chooses
its d candidate bins. To boost the probability of a ball choosing a bin having such Gap_i, this re-choosing
is carried out at most γ times, where γ will later be shown to be approximately a constant.

The current estimated average for each of the d bins finally selected by the ball is then incremented by
1/d. In the next paragraph we discuss the selection of this increment value. We intuitively argue that
if, for each bin, A_i is finally close to the actual average (m/n) w.h.p., and its load L_i is nearly equal
to its estimated average, the overall Gap in the system will be minimized and the maximum load of a bin will
be ⌈m/n⌉. The pseudo-code of the IDEA algorithm is shown in Algorithm 1.

Algorithm 1: IDEA Algorithm

Require: Number of bins (n), number of balls (m), and maximum iterations (γ)
Ensure: Balanced allocation of balls into bins

for all bins B_i, i ∈ [1, n] do
    Initialize the load, L_{B_i}, and the estimated average, A_{B_i}, to 0
end for
for all balls b_j, j ∈ [1, m] do
    loop ← 0
    while loop < γ do
        Choose d bins, C = {B_{in_1}, B_{in_2}, ..., B_{in_d}}, i.u.r. from the n bins
        if set C contains at least one bin with negative or zero estimated gap,
           Gap_{B_{in_i}} = L_{B_{in_i}} − A_{B_{in_i}}, then
            break while
        end if
        loop ← loop + 1
    end while
    Place ball b_j in the bin B ∈ C having the lowest estimated gap, Gap_B
    L_B ← L_B + 1
    for all bins B_{in_i} ∈ C do
        if A_{B_{in_i}} > ⌈j/n⌉ then
            flag ← 1
        else
            flag ← 0
        end if
        if flag = 0 then
            A_{B_{in_i}} ← A_{B_{in_i}} + 1/d
        end if
    end for
end for
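
The steps of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the d candidates are drawn as distinct bins, the estimated average is stored as an integer count of 1/d increments to avoid floating-point drift, ties are broken by candidate order, and the names `idea_allocate`, `max_retries` (standing in for γ) and `est_units` are hypothetical.

```python
import random


def idea_allocate(m: int, n: int, d: int = 2, max_retries: int = 8, seed: int = 0):
    """Allocate m balls to n bins following the steps of Algorithm 1.

    Returns (load, est_units); the estimated average of bin i is est_units[i] / d.
    """
    if max_retries < 1 or not 1 <= d <= n:
        raise ValueError("need max_retries >= 1 and 1 <= d <= n")
    rng = random.Random(seed)
    load = [0] * n
    est_units = [0] * n  # estimated average in units of 1/d (assumption, see lead-in)

    def scaled_gap(i: int) -> int:
        # d * Gap_i = d * L_i - d * A_i, kept as an integer.
        return d * load[i] - est_units[i]

    for j in range(1, m + 1):  # balls are numbered by arrival order
        for _ in range(max_retries):
            candidates = rng.sample(range(n), d)  # d distinct bins, i.u.r.
            if any(scaled_gap(i) <= 0 for i in candidates):
                break  # found a bin with zero or negative estimated gap
        # Place the ball in the candidate with the lowest estimated gap.
        target = min(candidates, key=scaled_gap)
        load[target] += 1
        cap = d * ((j + n - 1) // n)  # d * ceil(j / n)
        for i in candidates:
            # Increment by 1/d unless the estimate already exceeds ceil(j / n).
            if est_units[i] <= cap:
                est_units[i] += 1
    return load, est_units


if __name__ == "__main__":
    m, n = 20_000, 1_000
    load, est_units = idea_allocate(m, n, d=2)
    print("gap:", max(load) - m / n)
```

The returned loads can be compared against a plain d-choice run with the same m, n and d to observe how the retry loop and the per-bin estimates change the allocation.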

The probability that a bin is chosen by a ball in its d choices is d/n. So when n balls arrive, a bin
will be chosen d times in expectation. For each such choice the estimated average of the bin is incremented
by 1/d (Algorithm 1). Hence, its final estimated average will be 1, which is indeed the actual average of
the system. However, from Lemma 1 we observe that a bin might be chosen up to d log n times w.h.p.
Since we increase the estimated average by 1/d, the estimated average may increase beyond 1 in such cases.
Hence the estimated average of a bin may be greater than 1 in two situations:

(i) Not more than n balls have arrived, but the bin has been chosen close to d log n times, or

(ii) More than n balls have arrived.

For case (i), the estimated average of the bin should still remain 1, while in the other case, the estimated
average should be increased as usual. It is here that the numbering of the balls comes into effect. If the
estimated average of a bin goes beyond 1 and the next ball which selects this bin has a number less than n,
the bin knows that it may be chosen d log n times and hence refrains from increasing its estimated average
until a ball with number more than n selects it. Similarly, when the estimated average of a bin increases
beyond α, α ∈ ℕ, it checks if the next ball selecting it has a number greater than αn. Thus the balls
communicate their numbers as well while choosing the d candidate bins.

However, in the scenario where numbering of the balls is forbidden, we use a sampling technique among
the bins to differentiate between the two cases. A bin with estimated average just above α in this
case chooses log n bins i.u.r. and queries them for their estimated averages. If the average of
the estimated averages of the sampled bins is less than 1, the bin concludes that case (i) has happened,
i.e., it is receiving more than d balls out of n balls, and thus refrains from increasing its estimated average.
However, if the average of the estimated averages is 1, the bin decides that more than αn balls are arriving
and increases its estimated value as usual. By the sampling theorem, the probability that the error in the
sampled average is greater than ε, a small constant, is low when a constant number of samples is used for
m > n log n and when log n samples are used for m < n log n. Hence, w.h.p., we obtain the right
decision for each bin. In Appendix A we discuss in detail the proof for this claim, and also show that the total
amount of such sampling is less than the communication done if d = log n. More intelligent sampling
methods such as Reservoir Sampling [21], Subset-Sum Sampling [10][6] or a combination of Sampling and
Sketching lITSlfTSl may be used to obtain a better estimates. The study and effects of such methods are not 
discussed as a part of this paper. 
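
The sampling-based decision described above, for the case where balls carry no numbers, might be sketched as follows. This is only an illustrative reading of the text: the threshold `alpha` generalises the value 1 used in the description, natural logarithms are used for the sample size, and the helper names `sampled_average` and `should_increment` are assumptions.

```python
import math
import random


def sampled_average(est_avg: list, rng: random.Random, sample_size=None) -> float:
    """Average of the estimated averages of bins sampled i.u.r. (about log n of them)."""
    n = len(est_avg)
    if sample_size is None:
        sample_size = max(1, math.ceil(math.log(n)))  # assumption: natural log
    picks = [rng.randrange(n) for _ in range(sample_size)]
    return sum(est_avg[i] for i in picks) / sample_size


def should_increment(est_avg: list, alpha: float, rng: random.Random, sample_size=None) -> bool:
    """Decide whether a bin whose estimate just exceeded alpha may keep increasing it.

    True corresponds to case (ii): the sampled bins have also reached alpha.
    False corresponds to case (i): this bin was merely chosen unusually often.
    """
    return sampled_average(est_avg, rng, sample_size) >= alpha


if __name__ == "__main__":
    rng = random.Random(0)
    print(should_increment([0.5] * 100, 1, rng))  # other bins lag behind
    print(should_increment([1.0] * 100, 1, rng))  # other bins have caught up
```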

Hence, we find that IDEA dynamically adapts its estimated average to be closer to the actual average 
of the system. In either case, the estimated average of a bin is increased by at most 1 for every n balls. 

4 Theoretical Framework 

In this section, we provide a theoretical proof of the constant gap performance of the IDEA algorithm. 
First, we bound the number of balls that may select each bin. We then establish that each ball in the IDEA 
algorithm chooses at least one bin having negative Gap with a high probability, which makes the load of 
each bin converge to its estimated average value. Finally, we bound the Gap of the system to a constant 
value w.h.p. We assume m balls to arrive in an online fashion and there are n bins. 

Lemma 1. If each ball chooses d bins i.u.r. out of n bins, each bin is chosen by md/n balls in expectation,
and by at most (md/n) log n balls with high probability.

Proof. Define Y_1, Y_2, ..., Y_m to be indicator random variables corresponding to balls b_1, b_2, ..., b_m
respectively. Let Y_i = 1 represent the event that the ball b_i chose bin B as one of its d candidate bins;
otherwise Y_i = 0, ∀i ∈ [1, m]. Since the balls choose d bins i.u.r., the probability that bin B is chosen
among the d bins, Pr(Y_i = 1), is d/n. Let X be a random variable denoting the number of balls that chose
B among their d candidate bins. Hence, X = Σ_{i=1}^{m} Y_i. The expected value of X is

$$E[X] = E\Big[\sum_{i=1}^{m} Y_i\Big] = \sum_{i=1}^{m} E[Y_i] = \sum_{i=1}^{m} \frac{d}{n} = \frac{md}{n} \quad \text{[By Linearity of Expectation]} \qquad (1)$$

Applying Chernoff's bound on X we obtain

$$P\big(X > (1+\delta)E[X]\big) < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{E[X]}$$

Since the base on the right-hand side is less than 1 and E[X] = md/n ≥ 1 for m ≥ n,

$$\therefore\; P\Big(X > (1+\delta)\frac{md}{n}\Big) < \frac{e^{\delta}}{(1+\delta)^{1+\delta}}$$

Substituting δ = log n − 1 we have

$$P\Big(X > \frac{md}{n}\log n\Big) < \frac{e^{\log n - 1}}{(\log n)^{\log n}} = \frac{n}{e\,(\log n)^{\log n}} \qquad (2)$$

Let y = (log n)^{log n}. Hence, log y = log n · log log n. We have

$$\Rightarrow \log(y/n) = \log n\,(\log\log n - 1) = \log n\,(\log\log n - \log\log e^{e})$$

For large values of n, log log n − log log e^e > 1, giving log(y/n) > log n. Therefore, we have y > n^2.
Substituting in Eq. (2),

$$P\Big(X > \frac{md}{n}\log n\Big) < \frac{1}{en} \qquad (3)$$

Hence, bin B is chosen by at most (md/n) log n balls with a high probability of 1 − 1/(en). □
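
The quantities in Lemma 1 can be explored empirically with a short simulation. The sketch below counts how many of m balls include one fixed bin among their d candidates and compares that count with the expectation md/n and with (md/n) log n. The helper names, the use of distinct candidates per ball and the natural logarithm are assumptions made for illustration.

```python
import math
import random


def times_bin_chosen(m: int, n: int, d: int, bin_index: int = 0, seed: int = 0) -> int:
    """Count the balls (out of m) that pick `bin_index` among their d distinct candidates."""
    rng = random.Random(seed)
    return sum(1 for _ in range(m) if bin_index in rng.sample(range(n), d))


def expected_selections(m: int, n: int, d: int) -> float:
    """E[X] = md/n, as in Eq. (1)."""
    return m * d / n


def whp_bound(m: int, n: int, d: int) -> float:
    """The bound (md/n) log n from Lemma 1 (natural log assumed)."""
    return expected_selections(m, n, d) * math.log(n)


if __name__ == "__main__":
    m = n = 2_000
    d = 2
    print("observed:", times_bin_chosen(m, n, d))
    print("expected:", expected_selections(m, n, d))
    print("bound   :", whp_bound(m, n, d))
```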

Lemma 2. At any iteration, the estimated average of each bin is approximately equal to the current average
with high probability.

Proof. We assume here that Z balls have already arrived and have been placed among the n bins. The
number of balls that chose bin B among their d candidates is Zd/n in expectation, since each bin is chosen
by a ball with probability d/n. The number of such balls is also bounded by (Zd/n) log n with high probability
(by Lemma 1). However, a bin does not increment its estimated average more than d times for every n
balls. For each choice the bin B increases its estimated average by 1/d. Hence the current value of A_B is
given by

$$A_B = \frac{Zd}{n} \cdot \frac{1}{d} = \frac{Z}{n}, \text{ which is the current average.}$$

Hence, the estimated average A_B of any bin is nearly equal to the actual average w.h.p. □
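
Observation 1. The variance of the estimated average of a bin B for n balls is

$$\mathrm{Var}(A_B) = \mathrm{Var}\Big(\frac{X}{d}\Big) = \frac{1}{d^{2}}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{1}{d^{2}}\cdot n\cdot\frac{d}{n}\Big(1-\frac{d}{n}\Big) = \frac{1}{d}\Big(1-\frac{d}{n}\Big) \quad \text{[From Lemma 1]}$$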

Lemma 3. The amortized sum of the estimated gap, Gap, over all the bins is zero after every n balls.

Proof. Each ball chooses d candidate bins i.u.r. and is finally allocated to the bin having the least estimated
gap. For all the d chosen bins, the estimated average is increased by 1/d. The bin which receives
the ball witnesses an increase in its actual load by 1. Hence, overall its estimated gap increases by 1 − 1/d.
However, for the remaining d − 1 bins the loads remain the same, and thus their estimated gaps decrease
by 1/d. Hence the overall change in estimated gap over the d chosen bins is 1 − 1/d + (d − 1)(−1/d) = 0.
Since the sum of the estimated gaps of the bins was initially 0, the lemma holds.

Considering a batch of n balls arriving in the system, a bin may be selected more than d times (Lemma 1).
In such a case, the bin samples other bins for their current estimated average value, and depending on it may or
may not increase its estimated average, as discussed in Section 3. As such, the change in the overall estimated
gaps in this round will not add up to 0. Such a bin may not increase its estimated average, and IDEA experiences
a positive change in the overall estimated gap of the system for such a round.

However, it can be observed that for a batch of n balls, the total number of bin selections made by the
balls is exactly nd. Since we consider a bin to have been selected more than d times, there exists at least
one bin which was selected fewer than d times. Assume a bank exists which loans one unit of credit, per
extra selection, to a bin selected more than d times for n balls. If such a bin is selected d + c times over
a period of n incoming balls, the total credit loaned by the bank is exactly c. However, since the number of
selections is fixed, the total number of holes in the system will also be exactly equal to c. A hole in a bin
refers to the difference between d and the number of times the bin has been selected by n balls, for bins
selected fewer than d times. Each such bin can be considered to have an extra unit of credit per hole, which
it returns to the bank after n balls have been allocated to the system. Since the number of credits loaned by
the bank is exactly equal to the number of extra credits held by the bins in the system, after n balls the
total credit balance of the bank will be 0.

It can easily be observed that the total credit in the system is always a non-negative quantity. Since the
bins are chosen by the balls i.u.r., all the bins are selected nearly the same number of times over a period of n
balls, and no bin tends to accumulate a large quantity of extra credits that it always keeps returning to the bank.
This factor helps to maintain the estimated average of each bin close to the actual average of the system.
Hence, combining both the settings, we prove that in an amortized sense, the sum of the estimated gaps of
all the bins is 0 after every n balls. □
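
The per-ball bookkeeping in the first part of this proof can be checked with exact rational arithmetic. The sketch below assumes that all d chosen bins increment their estimate by 1/d (the case in which no bin withholds its increment); the function name `gap_changes` is hypothetical.

```python
from fractions import Fraction


def gap_changes(d: int) -> list:
    """Changes in estimated gap of the d chosen bins when one ball is placed.

    The first entry is the bin that receives the ball (load +1, estimate +1/d);
    the remaining d - 1 entries are the bins that only raise their estimate.
    """
    inc = Fraction(1, d)
    return [1 - inc] + [-inc] * (d - 1)


if __name__ == "__main__":
    for d in range(1, 6):
        changes = gap_changes(d)
        print(d, changes, "sum =", sum(changes))
```

For every d the changes sum to zero, matching the identity 1 − 1/d + (d − 1)(−1/d) = 0 used in the proof.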

Corollary 1. The sum of the estimated gap over all bins is zero for an arbitrarily small number of balls
allocated in the system.

Proof. Let the number of balls being allocated in the system be a function of n, f(n). Given the constraint
that the value of f(n) is not a constant, the arguments of Lemma 3 still hold. Consider f(n) = n^ε,
where ε is arbitrarily small, respecting the constraint that f(n) is not a constant. Thus, the sum of the
estimated gap in the system is 0 after f(n) balls have been allocated to the bins. □

Lemma 4. The number of bins having a zero or negative estimated gap, Gap, is Θ(n).

Proof. In Lemma 3 and Cor. 1 we show that the sum of the estimated gap of the bins is 0 even when an
arbitrarily small number of balls are allocated to the bins. As such, the number of bins with positive estimated
gap cannot increase by more than n^ε.

Let there be α bins with positive Gap, β bins with negative estimated gap, and θ bins having zero estimated
gap. Hence, α + β + θ = n. We would like to establish a lower bound on β + θ. In order to have the minimum
number of bins with negative or zero Gap, the value of the gap should be minimum for bins with a positive
gap and maximum in magnitude for bins with a negative gap. The minimum positive estimated gap for a bin is
Z(1 − (d − 1)/d), when Z(d − 1) balls have arrived in the system, of which only Z balls have been committed
into the bin. The maximum negative estimated gap that a bin may have in this case is −Z(d − 1)/d. Hence,

$$\alpha \cdot Z\Big(1 - \frac{d-1}{d}\Big) + \beta\cdot\Big(-\frac{Z(d-1)}{d}\Big) + \theta\cdot 0 = 0 \quad \text{[From Lemma 3]}$$

$$\alpha\,\frac{Z}{d} = \beta\,\frac{Z(d-1)}{d}$$

$$\alpha = \beta(d-1)$$

As α + β + θ = n, we have dβ + θ = n, so β + θ ≥ n/d. Hence, the number of bins with zero or negative Gap
is Ω(n).

For each round of f(n) balls, the number of bins with zero or negative estimated gap may decrease by
f(n). Consider that in round k, the number of bins with zero or negative gap is N(c_k). In the (k + 1)-th
round, the number of such bins may become N(c_k) − f(n). However, as f(n) is considered to be very
small, in the order notation the number of such bins still remains Θ(n). We rule out any
additive influence of f(n) per round by the amortized analysis argument in the above lemma and its
corresponding corollary. □

Lemma 5. Each ball chooses at least one bin having negative estimated gap among its d choices w.h.p. in
γ rounds.

Proof. Each ball independently selects d candidate bins uniformly at random among the n bins for its
placement. Hence the probability that bin B_i is chosen as a candidate for ball b_j is
P = C(n − 1, d − 1) / C(n, d) = d/n. Let there be c bins with zero or negative Gap. The probability that none
of these bins is selected as a candidate by a ball is C(n − c, d) / C(n, d). The ball may re-select its
candidates at most γ times. Therefore, the probability that none of the c bins is selected in any of the γ
tries is (C(n − c, d) / C(n, d))^γ. Hence the probability that at least one bin with zero or negative Gap is
selected in the γ iterations is given by

$$P(\text{at least one selected}) = 1 - \left(\frac{\binom{n-c}{d}}{\binom{n}{d}}\right)^{\gamma} \approx 1 - \frac{1}{2^{d\gamma}} \quad \text{[Assuming } c = n/2 \text{ from Lemma 4]} \qquad (4)$$

For d = 2 and γ = 2, we obtain a probability of around 0.94. However, with γ = log n, the probability
becomes nearly 1 − 1/n^d (for logarithms to base 2). Further, we can show that an approximately constant
number of retries suffices.

Let the number of bins with positive gap at any point of time be n^{1−ε}, where 0 < ε < 1. The probability
P_bneg with which a bin with a zero or negative gap is chosen in γ iterations is given by

$$P_{bneg} = 1 - \left(\frac{n^{1-\epsilon}}{n}\right)^{d\gamma} = 1 - \left(\frac{1}{n^{\epsilon}}\right)^{d\gamma}$$

For a zero or negative bin to be chosen with high probability, we need P_bneg ≥ 1 − 1/n^φ, where φ > 0.
Hence 1 − (1/n^ε)^{dγ} ≥ 1 − 1/n^φ. Thus, γ ≥ φ/(εd). Hence, at least one such bin is chosen by each ball in
an approximately constant number γ of re-polls or rounds per ball w.h.p. □
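
The probability in Eq. (4) is easy to evaluate numerically. The sketch below computes the exact expression with binomial coefficients for given n, c, d and number of tries, which is written γ in the text; the function name `prob_hit` and its parameter names are assumptions made for illustration.

```python
from math import comb


def prob_hit(n: int, c: int, d: int, tries: int) -> float:
    """Probability that at least one of c 'good' bins is among the d candidates
    in at least one of `tries` independent selections of d distinct bins."""
    # Probability that a single selection of d bins misses all c good bins.
    miss_once = comb(n - c, d) / comb(n, d)
    return 1 - miss_once ** tries


if __name__ == "__main__":
    n = 1000
    # Half of the bins have zero or negative gap, d = 2 choices, 2 tries.
    print(round(prob_hit(n, n // 2, d=2, tries=2), 4))
    for tries in (1, 2, 3, 4):
        print(tries, prob_hit(n, n // 2, d=2, tries=tries))
```

With c = n/2, d = 2 and two tries the value is close to 1 − 1/16, in line with the figure of about 0.94 quoted in the proof.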

In the next lemma, we show that in practice only a couple of retries are needed to get a bin with zero or 
negative estimated gap. 

Lemma 6. The expected number of rounds γ per ball to find a bin with zero or negative estimated gap is
constant.

Proof. Let p_i denote the probability that we find a zero or a negative bin at iteration i. Therefore, we have

$$p_i = (P_{pos})^{i-1} \cdot P_{neg} = \left(\frac{1}{2^{d}}\right)^{i-1}\left(1 - \frac{1}{2^{d}}\right) = \frac{2^{d}-1}{2^{id}}$$

where P_pos is the probability of selecting a bin with a positive estimated gap and P_neg is the probability of
selecting a bin with a zero or negative gap. The expected number of rounds per ball, γ, to find a zero or a
negative gap is given by

$$E[\gamma] = \sum_{i=1}^{\log n} i\,p_i = (2^{d}-1)\sum_{i=1}^{\log n}\frac{i}{2^{id}} \qquad (5)$$

Let

$$S = \sum_{i=1}^{\log n}\frac{i}{2^{id}} \qquad (6)$$

$$\frac{S}{2^{d}} = \sum_{i=1}^{\log n}\frac{i}{2^{(i+1)d}} \qquad (7)$$

Subtracting Eq. (7) from Eq. (6), we have

$$S\left(1-\frac{1}{2^{d}}\right) \approx \sum_{i=1}^{\log n}\frac{1}{2^{id}} \approx \frac{1}{2^{d}-1}, \quad\text{so}\quad S \approx \frac{2^{d}}{(2^{d}-1)^{2}} \qquad (8)$$

Substituting Eq. (8) in Eq. (5), we have

$$E[\gamma] \approx 1 + \frac{1}{2^{d}-1} \qquad (9)$$

$$\Rightarrow E[\gamma] < 2 \quad \text{for } d \ge 2$$

Given that the number of bins having negative or zero estimated gap always remains Θ(n), the number of
retries per ball remains constant throughout the execution of the IDEA algorithm. □
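
The expectation in Lemma 6 can be checked numerically by summing the series directly and comparing it with the value 1 + 1/(2^d − 1). The sketch assumes, as in the proof, that half of the bins have zero or negative estimated gap, so that all d candidates are positive with probability 1/2^d; the function names are hypothetical.

```python
def expected_rounds(d: int, max_rounds: int) -> float:
    """Truncated sum of i * p_i with p_i = (1/2^d)^(i-1) * (1 - 1/2^d)."""
    p_pos = 1 / 2 ** d   # every one of the d candidates has positive gap
    p_neg = 1 - p_pos    # at least one candidate has zero or negative gap
    return sum(i * p_pos ** (i - 1) * p_neg for i in range(1, max_rounds + 1))


def closed_form(d: int) -> float:
    """The limiting value 1 + 1/(2^d - 1) of the series."""
    return 1 + 1 / (2 ** d - 1)


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, expected_rounds(d, 60), closed_form(d))
```

The truncated sum approaches the closed form quickly because the terms decay geometrically with ratio 1/2^d.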

Lemma 7. The load of each bin tends to its estimated average.

Proof. IDEA places each ball into a bin with zero or negative Gap with high probability, 1 − 1/n^φ (Lemma 5),
using γ retries. When a ball is placed in a bin, its Gap increases. Thus, the probability that this bin will again
get a ball is lowered. On the other hand, the bins that had been chosen but did not receive the ball see
a decrease in their estimated gap. Hence, the probability that a ball is placed in them increases. So, a bin
with a negative or zero Gap has a higher probability of a ball being allocated to it, whereby its estimated gap
tends towards 0 (in the case of bins with negative estimated gap). On the other hand, bins with positive
estimated gap receive a ball with low probability even when chosen as candidates, and their estimated gap
decreases towards 0. Hence, we observe that the estimated gap of any bin tends towards 0. Since the
estimated gap is the difference between the load and the estimated average of a bin, and the gap tends to
zero, the load of the bins becomes nearly equal to their estimated average w.h.p. □

Theorem 1. The maximum load in any bin is $\lceil m/n \rceil + O(1)$ w.h.p. using the IDEA allocation algorithm
for the sequential, on-line and unweighted balls-into-bins problem.

Proof. Using the above lemmas we observe that the estimated average of each bin finally becomes $\lceil m/n \rceil$
and the load in each bin is equal to its estimated average w.h.p. Hence the maximum load in any bin is
$\lceil m/n \rceil + \Theta(1)$ w.h.p. □

Corollary 2. The IDEA algorithm provides a perfectly balanced allocation with constant gap. 

Proof. Since the maximum loaded bin has a load of $\lceil m/n \rceil + \Theta(1)$ w.h.p. (Theorem 1), the Gap is $\Theta(1)$,
providing a perfectly balanced allocation for the balls-into-bins problem with constant gap. □



5 Discussion 

We note that the Greedy[d] algorithm can also retry $\gamma$ times to find a bin with an even lower total number of
balls than what it could find in a single round. Still, the distribution of the balls in bins will be different from
that of the IDEA algorithm because the IDEA algorithm explicitly uses the estimated gap to make the decision of
where the ball is placed. The key question is whether the Greedy[d] algorithm can give a constant gap. The answer
is negative for a single retry because of the well-known lower bound of $\Omega(\ln \ln n)$ [2], while for multiple
retries $\gamma$ has to be $\Omega(\log n)$ [11] to achieve a constant gap. IDEA, however, requires only a constant ($< 2$)
number of retries in expectation (Lemma 6) to achieve the constant gap. Further, Lemma 5 bounds the number of
retries $\gamma$ it requires with high probability.
A bin B is chosen by d balls among n balls in expectation. However, the bin may be chosen $\alpha d$ times,
$0 < \alpha < 1$, among the first p balls that arrive. As such, the Greedy[d] choice algorithm will place the
balls in empty or lesser loaded bins if available. Among the remaining balls, B is chosen $(1 - \alpha)d$ times. Now,
for large values of $\alpha$, even if all these balls are placed in it, B will have a load far less than the average of
the system. So the Gap increases. However, for IDEA with large $\alpha$ values, the estimated average for B
will be large and hence its estimated gap will be significantly lower than that of the other bins. So, it has a higher
probability of a ball being allocated to it. Thus, when the remaining balls arrive and a small fraction of them
are placed in B, its load will still be closer to the actual average as compared to the d-choice algorithm. This
sensitivity towards skewness in the random choices also enables IDEA to arrive at a better allocation than
the d-choice.
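
For comparison, the Greedy[d] baseline discussed above can be sketched in a few lines. This is the textbook least-loaded-of-d rule with candidates drawn independently and uniformly at random (with replacement); the function name `greedy_d` and the fixed seed are illustrative assumptions.

```python
import random


def greedy_d(n, m, d, seed=0):
    """Place each of m balls into the least loaded of d random candidate bins."""
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):
        cand = [rng.randrange(n) for _ in range(d)]  # d choices, i.u.r.
        best = min(cand, key=lambda b: load[b])      # decision uses actual load only
        load[best] += 1
    return load


load = greedy_d(n=1000, m=10000, d=2)
print("Greedy[d] gap:", max(load) - 10000 / 1000)
```

Unlike IDEA, the decision here depends only on the current loads of the candidates, not on how often each bin has been chosen so far.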

6 Extended Framework 

6.1 Weighted Case 

In this section we consider the weighted case of the balls-into-bins problem where the balls have weights
drawn from a distribution x with an expected weight $W^*$, such that the weight of any ball, $W$, has a finite
variance and can be bounded by $(W^* - k) < W < (W^* + k)$, where $k$ is a constant. We apply the IDEA
algorithm and show that the gap is also constant w.h.p. in such scenarios.

Theorem 2. The maximum load in any bin is $W^*(\lceil m/n \rceil + O(1))$ w.h.p. using the IDEA allocation algorithm
for the sequential, on-line and weighted balls-into-bins problem.

Proof. Reworking the lemmas stated in Section 4, we observe that the estimated average of each bin converges
to $W^* \lceil m/n \rceil$ and that the load in each bin tends to its estimated average w.h.p. Hence the maximum
load in any bin is given by $W^*(\lceil m/n \rceil + O(1))$ w.h.p. The complete proofs of the lemmas for the weighted
case are provided in Appendix B. □

Corollary 3. The IDEA algorithm provides a perfectly balanced weighted allocation with constant gap 
even for the general weighted case of the Balls-into-bins problem. 

Proof. From Theorem 2 we observe that the maximum load is $W^*(\lceil m/n \rceil + O(1))$. Hence IDEA
provides a perfectly balanced allocation for the weighted case w.h.p., having a constant gap of $W^* \cdot \Theta(1)$. □

6.2 Multi-Dimensional Case 

In this section, we consider the multidimensional (md) variant of the balls and bins problem. One multidimensional
variant, proposed by [5], is as follows: Consider throwing m balls into n bins, where each ball is
a uniform D-dimensional (0-1) vector of weight $f$. Here, each ball has exactly $f$ non-zero entries chosen
uniformly among all $\binom{D}{f}$ possibilities. The average load in each dimension for each bin is given as $mf/(nD)$.

Let $l(a, b)$ be the load in dimension $a$ for the $b$-th bin. The gap in a dimension (across the bins)
is given by $gap(a) = \max_b l(a, b) - avg(a)$, where $avg(a)$ is the average load in dimension $a$. The
maximum gap across all the dimensions, $\max_a gap(a)$, then determines the load balance across all the
bins and the dimensions. Thus, for the multidimensional balanced allocation problem, the objective is to
minimize the maximum gap (across any dimension). We refer to the multidimensional ball as md-ball and
the multidimensional bin as md-bin.
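
The gap definition above translates directly into code. The following sketch assumes the loads are stored as a list of per-bin vectors, `loads[b][a]`, for bin b and dimension a; the function names are hypothetical.

```python
def dimension_gaps(loads):
    """Return gap(a) = max_b l(a, b) - avg(a) for every dimension a."""
    n = len(loads)      # number of bins
    D = len(loads[0])   # number of dimensions
    gaps = []
    for a in range(D):
        column = [loads[b][a] for b in range(n)]
        avg = sum(column) / n
        gaps.append(max(column) - avg)
    return gaps


def max_gap(loads):
    """Objective of the md problem: the largest gap over all dimensions."""
    return max(dimension_gaps(loads))


# Three bins, two dimensions.
loads = [[3, 0], [0, 1], [0, 2]]
print(dimension_gaps(loads), max_gap(loads))
```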

In another variation of multidimensional balanced allocation, the constraint of uniform distribution for
populated entries is removed. Here again, each ball is a D-dimensional 0-1 vector and each ball has exactly
$f$ populated dimensions, but these populated dimensions can have an arbitrary distribution. In the third
variation, which is the most general of the three, the number of populated dimensions, $f$, may be different across
the balls, where $f$ then is a random variable with an appropriate distribution.

Each md-ball has $f$ populated dimensions, where $f$ could be constant across the balls or a random
variable with a given distribution. Let $S_i(t)$ denote the sum of the loads (minus corresponding dimension
averages) across all D dimensions for the bin i at time t, expressed as $S_i(t) = \sum_{a=1}^{D} (l(a, i) - avg(a))$. This reduces
the problem to that of the scalar weighted case. The IDEA algorithm works based on the sum of the
dimensions for each bin. Also, for each choice of the bin, its estimated average is now incremented by ^. 
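
The reduction to the scalar case can be illustrated as follows. The sketch assumes the same `loads[b][a]` layout (bin b, dimension a) and computes, for each bin, the sum over dimensions of the load minus the corresponding dimension average; `scalar_gaps` is a hypothetical name.

```python
def scalar_gaps(loads):
    """Collapse each md-bin to one number: sum over dimensions of (load - dimension average)."""
    n = len(loads)
    D = len(loads[0])
    avg = [sum(loads[b][a] for b in range(n)) / n for a in range(D)]
    return [sum(loads[b][a] - avg[a] for a in range(D)) for b in range(n)]


# Three bins, two dimensions: the scalar values sum to zero across bins.
print(scalar_gaps([[3, 0], [0, 1], [0, 2]]))
```

Because each dimension average is subtracted once per bin, the resulting scalar values always sum to zero over the bins.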

Theorem 3. For the multi-dimensional scenario, the IDEA algorithm provides a constant gap for uniform 
distribution of the f populated dimensions for each ball with m = n. 

Proof. Following the analysis in Section WA\ the Gap in the system is bounded by 0(1). Hence, the dif- 
ference of the number of balls in the maximum bin and the actual average of the system is constant. For 
m = n, the average is 1 and so the number of balls in the maximum bin is also a constant. Given a uniform 
distribution of the $f$ populated dimensions of each ball over D, the Gap is bounded by $O(1)$. □

6.3 Parallel Case 

In this section we describe the algorithmic protocol to extend IDEA for the parallel balls-into-bins scenario. 
In the parallel scenario multiple balls are allocated to bins simultaneously in a single round. The remaining
balls are considered for allocation in the next round. This process is repeated until all the balls are allocated.
Later in this section we will show that the proposed protocol ensures that the algorithm completes in a finite 
number of rounds. We consider that in any round, r, a bin may accept only one ball. 

Let x balls be simultaneously allocated in round r. We observe that the outcome of round r can be
obtained by sequentially allocating x balls by IDEA. Hence any round in the parallel case can be replaced 
by a series of sequential processes of IDEA. Hence the gap remains constant even in the parallel case with 
IDEA. 



Algorithm 2: Communication Protocol 
Require: Number of bins (n). Number of choices per ball (d) 
Ensure: Parallel execution of IDEA 

Step 1. Each ball B_i chooses d bins as candidates for allocation, and stores the choices as M_i.

Step 2. Ball B_i queries its chosen bins (M_i) for their estimated gap.

Step 3. The queried bins return their estimated gap to the corresponding balls.

Step 4. Ball B_i selects the bin b_j with the lowest estimated gap among its chosen bins and sends it a
confirmation message, C1_ij.

Step 5. A bin b_j receiving a C1_ij message confirms allocation of ball B_i and sends it a message C2_ij. If a
bin receives multiple C1 messages, it arbitrarily selects one of them.

Step 6. Ball B_i, after receiving C2_ij, sends message INC to all its d chosen bins (M_i) and commits to
bin b_j.

Step 7. All the bins in M_i receiving the INC message increment their estimated average by 1/d.
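
The handshake above can be simulated round by round. The sketch below is a simplified model under stated assumptions: candidates are d distinct bins, unallocated balls redraw their candidates in the next round, a bin with several requests accepts the first one it sees, and each INC adds 1/d to the estimated average (stored here in integer units of 1/d). The names `parallel_round` and `run_parallel` are hypothetical.

```python
import random


def parallel_round(pending, load, est_units, d, rng):
    """Run one round of the protocol; return the balls that remain unallocated."""
    n = len(load)
    requests = {}  # bin -> list of (ball, candidate set) that sent a C1 message
    for ball in pending:
        cand = rng.sample(range(n), d)                         # Step 1
        gaps = {b: d * load[b] - est_units[b] for b in cand}   # Steps 2-3 (gap * d)
        target = min(cand, key=gaps.get)                       # Step 4
        requests.setdefault(target, []).append((ball, cand))
    placed = set()
    for b, reqs in requests.items():
        ball, cand = reqs[0]   # Step 5: arbitrary selection, here the first request
        load[b] += 1           # Step 6: the ball commits to bin b
        for c in cand:
            est_units[c] += 1  # Step 7: INC adds 1/d to each candidate
        placed.add(ball)
    return [ball for ball in pending if ball not in placed]


def run_parallel(n, m, d, seed=0):
    """Repeat rounds until every ball is allocated; return loads, estimates, rounds."""
    rng = random.Random(seed)
    load, est_units = [0] * n, [0] * n
    pending, rounds = list(range(m)), 0
    while pending:
        pending = parallel_round(pending, load, est_units, d, rng)
        rounds += 1
    return load, est_units, rounds


load, est_units, rounds = run_parallel(n=1000, m=1000, d=2)
print("rounds:", rounds, "max load:", max(load))
```

Every round places at least one ball, since at least one bin receives a request, so the loop terminates.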



The communication protocol, as given in Algorithm 2, ensures that there is no deadlock in the system
and that each bin accepts at most one ball in each round. Since the allocation of a ball into a bin is done by
two-way handshaking between the ball and the bin, a bin may receive multiple confirmations from the balls
but will accept only one of them, and since each ball makes a single choice of the bin where it prefers to be
allocated, deadlock in the system is avoided. The update of the estimated average of the bins receiving the
INC message is similar to that of the sequential IDEA with the use of sampling.

We now prove that the algorithm terminates in a finite number of rounds to guarantee a constant gap.

Theorem 4. IDEA in the parallel scenario using the communication protocol described in Algorithm 2
provides a constant gap in expected $O(\log \log n)$ rounds.

Proof. Since each round of the parallel case of IDEA can be simulated with multiple sequential processes 
of it, IDEA along with the communication protocol described above provides a constant gap. 

We observe that the execution of IDEA is identical to that of the ordinary d-choice algorithm except 
for the parameter on which the allocations of the balls are done. Hence Theorem 21 of Q stating that the 
Threshold(l) for parallel cases terminates after at most log log n + 0(1) steps, holds in our case as well. 
However, each ball will select a bin with zero or negative estimated gap in $\gamma$ retries. Hence the total number
of rounds taken by IDEA in the parallel setting will be given by $\gamma \log \log n$. The expected value of $\gamma$ is
a constant (Lemma 6). Hence the expected number of rounds for the algorithm to terminate is given by
$O(\log \log n)$. □

It can easily be observed that this protocol still provides a constant gap even for the heavily loaded
case when $m \gg n$.



7 Conclusions 

This paper proposes the Improved D-choice with Estimated Average (IDEA) algorithm, which w.h.p. provides
a perfectly balanced allocation for the sequential, online and uniform weighted balls-into-bins problem.
We propose a better metric for greedy placement of the balls using the estimated average of the system
for each bin. We show that for a constant d choice and expected constant number of rounds per ball, the
maximum load of a bin in IDEA is $\lceil m/n \rceil + O(1)$ w.h.p. This result holds for the $m = n$ case as well as the
heavily loaded scenario where $m \gg n$. We also extend the solution to the general weighted case (with
$m \gg n$) to show similar results for balls with weights taken from an arbitrary distribution with finite variance,
and to the multi-dimensional case with $m = n$ for uniform distribution of $f$ populated dimensions
over the D total dimensions. We also propose a communication protocol which, in conjunction with IDEA,
provides a constant gap with expected $O(\log \log n)$ rounds.
A Sampling



Allocation of balls-into-bins for a single choice procedure approximately follows a Poisson distribution. We
leverage this fact for the d choice scenario to show that the sampling done by the IDEA algorithm fairly
accurately updates the estimated average of the bins w.h.p.

Let $\lambda$ be the mean of the number of times a bin is chosen. Hence $\lambda = dm/n$. Also assume the sample size
to be $N$. Define $X$ to be the sum of the number of times the sampled bins have been chosen. Since the
number of times a bin is chosen is a random variable that follows a Poisson distribution (for a single choice
process) and the choices of the bins are independent Poisson distributions each with mean $\lambda = dm/n$, the
sum over a sample of size $N$ also follows a Poisson distribution with mean $N\lambda$. We would like $X$
to be bounded in the region $[N\lambda/\beta, N\lambda\beta]$ w.h.p., where $\beta$ is arbitrarily close to 1. Applying Chernoff's bound
we have,

\[ P\left(\frac{N\lambda}{\beta} < X < N\lambda\beta\right) = 1 - \left(P\left(X < \frac{N\lambda}{\beta}\right) + P(X > N\lambda\beta)\right) \tag{10} \]

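Given Poisson's tail bound,

\[ P(X > N\lambda\beta) \le \frac{e^{-N\lambda}(eN\lambda)^{N\lambda\beta}}{(N\lambda\beta)^{N\lambda\beta}} = \left(\frac{e^{\beta-1}}{\beta^{\beta}}\right)^{N\lambda} \tag{11} \]

Substituting $\beta = 1 + \frac{1}{\omega}$ for some large $\omega > 1$, Eq. (11) becomes equal to
$\left(\frac{e^{1/\omega}}{(1 + 1/\omega)^{1 + 1/\omega}}\right)^{N\lambda}$. Approximating $e^x$
to be less than $1 + x + x^2$ for small values of $x$, we observe that the above fraction is less than 1.
Replacing the fraction with $\frac{1}{a}$, where $a > 1$, and substituting it in Eq. (11) with the expected value of $\lambda$, we
have,

\[ P(X > N\lambda\beta) < \left(\frac{1}{a}\right)^{N\lambda} = \left(\frac{1}{a}\right)^{Ndm/n} \tag{12} \]

For $m > n \log n$, Eq. (12) becomes

\[ P(X > N\lambda\beta) < \left(\frac{1}{a}\right)^{Nd \log n} = \left(\frac{1}{a^{\log n}}\right)^{Nd} \approx \frac{1}{n^{Nd+c}} \quad \text{[where $c$ is a constant]} \]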

Hence, we observe that a constant number of samples suffices to guarantee high probability for bounding $X$
within the factor of $\beta$ when $m > n \log n$. However, when $m < n \log n$, we need $N = \log n$ samples for the
same guarantee to hold. Similar results can thus be obtained for $P(X < N\lambda/\beta)$. Hence Eq. (10) becomes,

\[ P\left(\frac{N\lambda}{\beta} < X < N\lambda\beta\right) = 1 - \left(P\left(X < \frac{N\lambda}{\beta}\right) + P(X > N\lambda\beta)\right) \ge 1 - \frac{2}{n^{Nd+c}} \]

Therefore, IDEA needs to sample a constant number of bins when $m > n \log n$, or $\log n$ bins when
$m < n \log n$, to efficiently and accurately update the estimated average of each bin to be close to
the actual average of the system w.h.p.
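
As a sanity check on the tail-bound argument, the standard Chernoff-style bound for a Poisson variable can be compared against the exact tail. In the sketch below, `mu` stands in for the mean $N\lambda$ of the sampled sum; the specific bound $(e^{\beta-1}/\beta^{\beta})^{\mu}$ is the textbook form and is an assumption about which variant the analysis uses. Function names are hypothetical.

```python
import math


def poisson_tail(mu, x):
    """Exact P(X >= x) for X ~ Poisson(mu), with x a non-negative integer."""
    below = sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x))
    return 1.0 - below


def chernoff_poisson(mu, beta):
    """Upper bound (e^(beta-1) / beta^beta)^mu on P(X >= beta * mu), for beta > 1."""
    return (math.exp(beta - 1.0) / beta ** beta) ** mu


mu, beta = 10, 1.5
print("exact tail:", poisson_tail(mu, int(beta * mu)))
print("bound:     ", chernoff_poisson(mu, beta))
# A larger mean (more samples N, or larger lambda) drives the bound down geometrically.
print("bound at 2*mu:", chernoff_poisson(2 * mu, beta))
```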

We also calculate the total number of samplings (amount of communication) done by the IDEA algorithm
in the case $m < n \log n$. On arrival of n balls, the expected number of times a bin is chosen is
d. However, this is bounded by $d \log n$ w.h.p. A bin will sample N other bins only when it is chosen
more than d times when n balls have been thrown. Using the Poisson tail bound, in the general case when
nk balls have been thrown ($k \in [1 .. \log n]$), the probability of a bin being chosen $\beta\lambda_k$ times ($\beta > 1$) is given
by $\Pr(k) = \left(\frac{e^{\beta-1}}{\beta^{\beta}}\right)^{\lambda_k}$, where $\lambda_k$ is the expected number of times a bin is chosen when nk balls have been
thrown. Hence, the expected number of total samplings, E[Samples], done when a total of $n \log n$ balls have
been thrown is given by,

\[ E[Samples] = \sum_{k=1}^{\log n} nd \Pr(k) < nd \quad \text{[By algebraic manipulations]} \]

Since d is a constant, the expected number of samplings done by IDEA is $O(n)$ and the total communication
done by IDEA is less than that in the naive case when $d = \log n$.



B Theoretical Framework for the Weighted Case 

In this section, we provide a theoretical proof of the constant gap performance of the weighted version of
the IDEA algorithm. We follow the same proof sketch as in the case of balls with unit weight. Further, we
again assume m balls and n bins, $m \ge n$.

Lemma 8. If each weighted ball chooses d bins i.u.r. out of n bins, each bin is chosen by $\frac{dm}{n}$ balls in
expectation, and by at most $\frac{dm}{n} \log n$ weighted balls with high probability.

Proof. Similar to Proof of Lemma 1. □

Lemma 9. At any iteration, the estimated average of each bin is approximately equal to the current average 
w.h.p. 

Proof. We assume here that Z balls have already arrived and have been placed among the n bins. The
number of balls that chose bin B among their d candidates is $Zd/n$ in expectation, since each bin is chosen
by a ball with a probability of $d/n$. The number of such balls is also bounded by $(1 + \log n)\frac{Zd}{n}$ with high
probability (by Lemma 1). However, a bin does not increment its estimated average more than d times
when n balls are thrown. For each selection of bin B, it increases its estimated average by $W/d$, which is
bounded by $\frac{W^* - k}{d} < \frac{W}{d} < \frac{W^* + k}{d}$. Hence the current value of $A_B$ is given by,

\[ A_B = \frac{Zd}{n} \cdot \frac{W^* \pm k}{d} = \frac{Z(W^* \pm k)}{n}, \text{ which is the current average.} \]

Hence, the estimated average $A_B$ of any bin is nearly equal to the actual average w.h.p. □

Lemma 10. The amortized sum of the estimated gap, Gap, over all the bins is zero.

Proof. Each ball chooses d candidate bins uniformly at random and is finally allocated to the bin having
the lowest estimated gap. Hence for all the d chosen bins, their estimated average increases by $W/d$. The
load of the $d - 1$ bins which do not receive the ball remains the same, and thus their estimated gap decreases by
the above amount. However, for the bin in which the ball is placed, its load increases by $W$ and its estimated
gap increases by $W(1 - \frac{1}{d})$. Applying the arguments presented in the proof of Lemma 3 and Corollary 1, we
observe that the sum of change of the estimated gap over the d chosen bins in any iteration is
$W(1 - \frac{1}{d}) + (d - 1)\left(-\frac{W}{d}\right) = 0$. Using similar analysis applied in the proof of Lemma 3, it can be shown that the sum of
the estimated gap is zero by amortized analysis. □
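
The per-iteration bookkeeping in this proof can be verified with exact arithmetic. The sketch below assumes, as in the proof, that all d candidates raise their estimated average by W/d and that only the receiving bin's load grows by W; `gap_changes` is a hypothetical helper.

```python
from fractions import Fraction


def gap_changes(weight, d):
    """Change in estimated gap of each of the d candidates when one ball is placed."""
    inc = Fraction(weight) / d         # every candidate's estimated average rises by W/d
    receiver = Fraction(weight) - inc  # load +W, estimated average +W/d
    others = [-inc] * (d - 1)          # load unchanged, estimated average +W/d
    return [receiver] + others


changes = gap_changes(weight=3, d=4)
print(changes, "sum =", sum(changes))
```

The sum is zero for any weight and any d, which is the invariant the amortized argument relies on.
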
Corollary 4. The sum of the estimated gap over all bins is zero for an arbitrarily small number of balls allocated
in the system.

Proof. Similar to Proof of Corollary 1. □

Lemma 11. The number of bins having a zero or negative estimated gap, Gap, is $\Theta(n)$.

Proof. Using the arguments presented in the above lemmas, we provide a sketch of the proof below, similar
to that of Lemma 4. Let there be $\alpha$ bins with positive Gap, $\beta$ bins with negative estimated gap, and $\theta$ bins
having zero estimated gap. Hence, $\alpha + \beta + \theta = n$. We would like to establish a lower bound on $\beta + \theta$. In order to
have the minimum number of bins with negative or zero Gap, the value of the gap should be minimum for bins
with a positive gap and maximum for bins with a negative gap. The minimum positive estimated gap for a
bin is $Z W_{min} - \frac{1}{d}\sum_{i=1}^{Z(d-1)} W_i \approx Z(W^* \pm k)\left(1 - \frac{d-1}{d}\right)$ when $Z(d - 1)$ balls have arrived in the system, of
which only Z balls have been committed into the bin. We have $W_{min} = \min\{W_1, W_2, \ldots, W_{Z(d-1)}\}$. The
maximum negative estimated gap that a bin may have in this case is $-\frac{1}{d}\sum_{i=1}^{Z(d-1)} W_i \approx -\frac{Z(d-1)(W^* \pm k)}{d}$.
Hence,

\[ \alpha \cdot Z(W^* \pm k)\left(1 - \frac{d-1}{d}\right) + \beta \cdot \left(-\frac{Z(d-1)(W^* \pm k)}{d}\right) + \theta \cdot 0 = 0 \quad \text{[From Lemma 10]} \]

\[ \therefore \alpha = \beta(d - 1) \]

Further, $\alpha + \beta + \theta = n$. Hence, $d\beta + \theta = n$. So, the number of bins with zero or negative Gap is
$\beta + \theta \ge n/d$, which is $\Theta(n)$ for constant d.

Arguing similarly in the lines of Corollary [H we can claim that the gap is still 0(n) even when each 
round has /(n) = rf balls, where /(n) is not a constant. □ 

Lemma 12. Each ball chooses at least one bin having negative estimated gap among its d choices w.h.p. in
$\gamma$ rounds.

Proof. Similar to Proof of Lemma 5. □

Lemma 13. The expected number of rounds, $\gamma$, per ball to find a bin with zero or negative estimated gap is
constant.

Proof. Similar to Proof of Lemma 6. □

Lemma 14. The load of each bin tends to its estimated average.

Proof. Similar to Proof of Lemma 7. □